An Algorithm for Locating Fundamental Frequency (F0) Markers in Speech

Authors

  • Princy Dikshit
  • Vijayan K. Asari
  • Stephen A. Zahorian

Abstract

Princy Dikshit, Old Dominion University, December 2004. Director: Dr. Stephen A. Zahorian.

Speech has been the principal form of human communication since it began to evolve at least one hundred thousand years ago. Voiced speech is produced by vibrations of the vocal cords, and the rate of vibration of the cords is called the fundamental frequency (F0), or pitch. The objective of this thesis is to locate pitch periods on a cycle-by-cycle basis, a task made difficult by the highly irregular nature of human speech.

Dynamic programming is used to combine two sources of information for pitch-period marking. The first is "local" information: the location and amplitude of peaks in the acoustic speech signal. The second is "transition" information: how closely the distance between signal peaks matches the expected pitch period. The expected pitch-period values are obtained either from a pitch tracker (YAPT) or from the reference pitch track.

The Keele speech database was used for testing. In experiments with clean speech signals, over 95% of the identified pitch cycles were within 1 ms of the actual pitch cycles. In experiments with noisy speech signals, an accuracy rate of 92% or higher was observed for SNRs from 30 dB down to 5 dB. In an experiment evaluating the robustness of the algorithm to errors in the pitch track, using clean studio-quality signals, an accuracy rate of 95% was obtained for pitch errors from -10% to +60%. The algorithm generated ≤ 1% extra markers (false positives) both for clean studio-quality signals (pitch-track errors from -10% to +60%) and for noisy speech signals (SNR from 30 dB to 5 dB).
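The abstract reports accuracy as the share of pitch cycles located within 1 ms of the actual cycles, together with a rate of extra markers, but it does not state the exact matching rule. One plausible way to compute both figures is sketched below. The function name `score_markers`, the greedy nearest-marker matching, and expressing extra markers as a percentage of the reference count are all assumptions for illustration, not the thesis's evaluation procedure.

```python
def score_markers(detected, reference, fs, tol_ms=1.0):
    """Score detected pitch markers against reference markers (hypothetical rule).

    detected, reference -- marker positions in samples
    fs                  -- sampling rate in Hz
    tol_ms              -- match tolerance in milliseconds (1 ms in the abstract)

    Assumed rule: each reference marker is paired with the closest detected
    marker not yet used, and counts as a hit if that marker lies within the
    tolerance. Returns (accuracy %, extra markers %), both relative to the
    number of reference markers.
    """
    tol = tol_ms * 1e-3 * fs              # tolerance converted to samples
    detected = sorted(detected)
    matched = set()                       # indices of detected markers already paired
    hits = 0
    for r in reference:
        candidates = [k for k in range(len(detected)) if k not in matched]
        if not candidates:
            break                         # remaining reference markers are misses
        best = min(candidates, key=lambda k: abs(detected[k] - r))
        if abs(detected[best] - r) <= tol:
            matched.add(best)
            hits += 1
    n_ref = len(reference)
    if n_ref == 0:
        return 0.0, 0.0
    accuracy = 100.0 * hits / n_ref
    extra = 100.0 * (len(detected) - len(matched)) / n_ref  # unmatched detections
    return accuracy, extra
```

Greedy matching keeps the sketch short; a stricter evaluation could instead use a one-to-one optimal assignment, for example the Hungarian algorithm.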
Using the pitch track generated by the ODU pitch tracker (YAPT) to identify pitch markers gave an accuracy rate of 95%, compared with 93% using the reference pitch track supplied with the Keele database. A preliminary test on telephone-quality signals gave an accuracy rate of 63%.
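The abstract describes the search only in terms of its two information sources. A minimal Python sketch of such a dynamic-programming marker search follows, under these assumptions (none taken from the thesis): candidate markers are positive local maxima of the waveform; the local cost of a peak is one minus its normalized amplitude; the transition cost is the relative mismatch between peak spacing and the expected period; spacing is limited to 0.5 to 1.5 expected periods; and a path must start within the first expected period and end within the last. The names `find_candidate_peaks` and `mark_pitch_cycles`, the `weight` parameter and the spacing limits are hypothetical.

```python
import numpy as np


def find_candidate_peaks(x):
    """Return indices of positive local maxima of x (candidate pitch markers)."""
    x = np.asarray(x, dtype=float)
    mid = x[1:-1]
    is_peak = (mid > x[:-2]) & (mid >= x[2:]) & (mid > 0)
    return np.nonzero(is_peak)[0] + 1


def mark_pitch_cycles(x, expected_period, weight=1.0, lo=0.5, hi=1.5):
    """Choose one peak per pitch cycle by dynamic programming (illustrative sketch).

    x               -- a voiced speech segment (1-D array); voicing is not checked
    expected_period -- expected pitch period, in samples, at every sample of x
                       (e.g. interpolated from a frame-based pitch track)
    weight          -- hypothetical weight of transition cost vs. local cost
    lo, hi          -- hypothetical limits on peak spacing, as fractions of
                       the expected period
    """
    x = np.asarray(x, dtype=float)
    T = np.asarray(expected_period, dtype=float)
    peaks = find_candidate_peaks(x)
    if len(peaks) == 0:
        return np.array([], dtype=int)

    # Local cost: 0 for the tallest candidate peak, approaching 1 for tiny ones.
    local = 1.0 - x[peaks] / x[peaks].max()

    n = len(peaks)
    cost = np.full(n, np.inf)    # best accumulated cost of a path ending at peak j
    back = np.full(n, -1)        # predecessor of peak j on that path (-1 = start)
    for j, pj in enumerate(peaks):
        # A path may begin only within the first expected period.
        if pj < T[pj]:
            cost[j] = local[j]
        for i in range(j):
            d = pj - peaks[i]
            if not (lo * T[pj] <= d <= hi * T[pj]) or not np.isfinite(cost[i]):
                continue
            # Transition cost: relative mismatch between peak spacing and
            # the expected pitch period.
            trans = abs(d - T[pj]) / T[pj]
            c = cost[i] + weight * trans + local[j]
            if c < cost[j]:
                cost[j], back[j] = c, i

    # A path must end within the last expected period of the segment.
    ends = [j for j in range(n)
            if peaks[j] >= len(x) - T[peaks[j]] and np.isfinite(cost[j])]
    if not ends:
        return np.array([], dtype=int)

    # Backtrack from the cheapest admissible end point.
    j = min(ends, key=lambda k: cost[k])
    path = []
    while j != -1:
        path.append(int(peaks[j]))
        j = int(back[j])
    return np.array(path[::-1], dtype=int)
```

The inner loop is quadratic in the number of candidate peaks; a practical version would only look back over peaks within `hi` expected periods and would handle unvoiced regions, which this sketch ignores.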

Similar resources

Problems of concurrent speech sound segregation in hearing-impaired children

Objective: This study was a basic investigation of the ability of concurrent speech segregation in hearing impaired children. Concurrent segregation is one of the fundamental components of auditory scene analysis and plays an important role in speech perception. In the present study, we compared auditory late responses or ALRs between hearing impaired and normal children. Materials & Methods...

Predicting gradient F0 variation: pitch range and accent prominence

Many aspects of prosody prediction in speech synthesis could be improved, from placement of symbolic accent and phrase boundary markers to control of continuously varying parameters (e.g., duration, fundamental frequency). The goal of this work is to develop algorithms for predicting aspects of fundamental frequency typically said to have gradient variation: pitch range and prominence. In addit...

Vae-space: Deep Generative Model of Voice Fundamental Frequency Contours

Modeling the speech generation process can provide flexible and interpretable ways to generate intended synthetic speech. In this paper, we present a deep generative model of fundamental frequency (F0) contours of normal speech and singing voices. The generative model we propose in this paper 1) is able to accurately decompose an F0 contour into the sum of phrase and accent components of the Fu...

Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals

A fundamental frequency (F0) estimator named Harvest is described. The unique points of Harvest are that it can obtain a reliable F0 contour and reduce the error that the voiced section is wrongly identified as the unvoiced section. It consists of two steps: estimation of F0 candidates and generation of a reliable F0 contour on the basis of these candidates. In the first step, the algorithm use...

A Statistical Phrase/Accent Model for Intonation Modeling

This paper proposes a statistical phrase/accent model of voice fundamental frequency (F0) for speech synthesis. It presents an approach for automatic extraction and modeling of phrase and accent phenomena from F0 contours by taking into account their overall trends in the training data. An iterative optimization algorithm is described to extract these components, minimizing the reconstruction er...

Publication date: 2004